    Memory-based vocalization of Arabic

    The problem of vocalization, or diacritization, is essential to many tasks in Arabic NLP. Arabic is generally written without the short vowels, which leads to one written form having several pronunciations, each carrying its own meaning(s). In the experiments reported here, we define vocalization as a classification problem in which we decide, for each character in the unvocalized word, whether it is followed by a short vowel. We investigate the importance of different types of context. Our results show that combining memory-based learning with only a word-internal context leads to a word error rate of 6.64%. If a lexical context is added, the results deteriorate slowly.
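
    As a rough illustration of the per-character classification setup described in this abstract, the sketch below builds one instance per character from a word-internal window and classifies it by nearest-neighbour lookup. The abstract does not specify the feature set, window size, distance metric, or software used, so the window size, the overlap metric, the `-` label for "no short vowel", the Latin transliteration, and all function names here are assumptions made for illustration only.

```python
from collections import Counter

PAD = "_"  # assumed padding symbol for positions outside the word


def windows(chars, size=2):
    """One feature tuple per character: `size` characters of left and
    right word-internal context around the focus character."""
    padded = [PAD] * size + list(chars) + [PAD] * size
    return [tuple(padded[i:i + 2 * size + 1]) for i in range(len(chars))]


def make_instances(pairs, size=2):
    """Turn a vocalized word, given as (character, label) pairs, into
    (features, label) training instances. The label is the short vowel
    that follows the character, or "-" if there is none."""
    chars = [c for c, _ in pairs]
    labels = [label for _, label in pairs]
    return list(zip(windows(chars, size), labels))


def overlap(a, b):
    """Number of feature positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))


class MemoryBasedClassifier:
    """Stores all training instances and labels a new instance by
    majority vote among the k most similar stored instances."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []

    def fit(self, instances):
        self.memory.extend(instances)

    def predict(self, features):
        ranked = sorted(self.memory,
                        key=lambda inst: overlap(inst[0], features),
                        reverse=True)
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


def vocalize(word, clf, size=2):
    """Predict a label for every character of an unvocalized word."""
    return [(c, clf.predict(w)) for c, w in zip(word, windows(word, size))]


# Toy training data in a made-up Latin transliteration (hypothetical).
clf = MemoryBasedClassifier(k=1)
clf.fit(make_instances([("k", "a"), ("t", "a"), ("b", "a")]))  # kataba
clf.fit(make_instances([("q", "a"), ("l", "a"), ("m", "-")]))  # qalam
print(vocalize("qlm", clf))
```

    A word counts as an error whenever any of its characters receives a wrong label, which is how a per-character classifier would be scored by a word error rate.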

    Parallel Database Architectures: A Simulation Study.

    Parallel database systems are gaining popularity as a solution that provides scalability in large and growing databases. A parallel database system is a DBS which exploits multiprocessing systems to improve performance. Parallel database computers can be classified into three categories: shared memory, shared disk, and shared nothing. In shared memory, all resources, including main memory and disk units, are shared among several processors. In shared disk, a group of processors share a common pool of disks, but each processor has its own private main memory. In the shared-nothing system, every processor has its own memory and disk unit, that is, except for communication links, no resources are shared among the processors. In this work, we compare the performance of the three architecture classes. Simulation models for the various architectures are introduced. Using these models, a number of experiments were conducted to compare the system performance of these architectures under different workloads and transaction models. The aim of this work is to provide a tool for evaluating the different architectures and their appropriateness for a specific database application.

    Multicast Services for Multimedia Collaborative Applications

    This work aims at providing multicast services for multimedia collaborative applications over large inter-networks such as the Internet. Multimedia collaborative applications are typically of small group size, slow group membership dynamics, and awareness of participants' identities and locations. Moreover, they usually consist of several components such as audio, video, shared whiteboard, and single user application sharing engines that collectively help make the collaboration session successful. Each of these components has its demands from the communication layer that may differ from one component to another. This dissertation identifies the overall characteristics of multimedia collaborative applications and their individual components. It also determines the service requirements of the various components from the communication layer. Based on the analysis done in the thesis, new techniques of multicast services that are more suitable for multimedia collaborative applications are introduced. In particular, the focus will be on multicast address management and connection control, routing, congestion and flow control, and error control. First, we investigate multicast address management and connection control and provide a new technique for address management based on address space partitioning. Second, we study the problem of multicast routing and introduce a new approach that fits the real time nature of multimedia applications. Third, we explore the problem of congestion and flow control and introduce a new mechanism that takes into consideration the heterogeneity within the network and within the processing capabilities of the end systems. Last, we exploit the problem of error control and present a solution that supports various levels of error control to the different components within the collaboration session. We present analytic as well as simulation studies to evaluate our work, which show that our techniques outperform previous ones.

    Small sample confidence bands for the survival functions under proportional hazards model

    In this work, a saddlepoint-based method is developed for generating small sample confidence bands for the population survival function from the Kaplan-Meier (KM), the product limit (PL), and Abdushukurov-Cheng-Lin (ACL) survival function estimators, under the proportional hazards model. In the process, the exact distribution of these estimators is derived and mid-population tolerance bands for said estimators are developed. The proposed saddlepoint method depends upon the Mellin transform of the zero-truncated survival estimator, which is derived for the KM, PL, and ACL estimators. These transforms are inverted via saddlepoint approximations to yield highly accurate approximations to the cumulative distribution functions of the respective cumulative hazard function estimators, and these distribution functions are then inverted to produce saddlepoint confidence bands. The saddlepoint confidence bands for the KM, PL, and ACL estimators are compared with those obtained from competing large sample methods as well as those obtained from the exact distribution. In the simulation studies it is found that the saddlepoint confidence bands are very close to the confidence bands derived from the exact distribution, while being much easier to compute, and outperform the competing large sample methods in terms of coverage probability (Abstract, page iii).
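
    For readers unfamiliar with the estimators named in this abstract, the sketch below computes the textbook Kaplan-Meier point estimate from right-censored data. It shows only the standard estimator, not the saddlepoint confidence bands the abstract proposes. The function name and the input convention (an event flag of 1 for an observed failure, 0 for a censored time) are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed
    failure, where S(t) is the product over failure times t_i <= t of
    (1 - d_i / n_i), with d_i failures and n_i subjects at risk at t_i."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures:
            survival *= 1 - failures / n_at_risk
            curve.append((t, survival))
        # Failures and censored subjects both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Four subjects; the one observed at time 2 is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

    Censored subjects never lower the estimate directly; they only shrink the risk set used at later failure times.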

    Industrial energy efficiency optimisation through cogeneration using biomass

    ORTHOGRAPHIC ENRICHMENT FOR ARABIC GRAMMATICAL ANALYSIS

    Thesis (Ph.D.) - Indiana University, Linguistics, 2010. The Arabic orthography is problematic in two ways: (1) it lacks the short vowels, and this leads to ambiguity as the same orthographic form can be pronounced in many different ways, each of which can have its own grammatical category, and (2) the Arabic word may contain several units like pronouns, conjunctions, articles and prepositions without an intervening white space. These two problems lead to difficulties in the automatic processing of Arabic. The thesis proposes a pre-processing scheme that applies word segmentation and word vocalization for the purpose of grammatical analysis: part of speech tagging and parsing. The thesis examines the impact of human-produced vocalization and segmentation on the grammatical analysis of Arabic, then applies a pipeline of automatic vocalization and segmentation for the purpose of Arabic part of speech tagging. The pipeline is then used, along with the POS tags produced, for the purpose of dependency parsing, which produces grammatical relations between the words in a sentence. The study uses the memory-based algorithm for vocalization, segmentation, and part of speech tagging, and the natural language parser MaltParser for dependency parsing. The thesis represents the first approach to the processing of real-world Arabic, and has found that through the correct choice of features and algorithms, the need for pre-processing for grammatical analysis can be minimized.